% CGRtools Tutorial % Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev % Mar 25, 2019
(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev
Installation instructions of CGRtools package information and tutorial's files see on https://github.com/cimm-kzn/CGRtools
NOTE: Tutorial should be performed sequentially from the start. Random cell running will lead to unexpected results.
In [ ]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:
print('WARNING. Tutorial was tested on 3.1 version of CGRtools')
else:
print('Welcome!')
In [ ]:
# load data for tutorial
from pickle import load
from traceback import format_exc
with open('molecules.dat', 'rb') as f:
molecules = load(f) # list of MoleculeContainer objects
with open('reactions.dat', 'rb') as f:
reactions = load(f) # list of ReactionContainer objects
m1, m2, m3, m4 = molecules # molecule
m7 = m3.copy()
m11 = m3.copy()
m11.standardize()
m7.standardize()
r1 = reactions[0] # reaction
m5 = r1.reactants[0]
m8 = m7.substructure([4, 5, 6, 7, 8, 9], as_view=False)
m10 = r1.products[0].copy()
CGRtools has subpackage containers with data structures classes:
In [ ]:
from CGRtools.containers import * # import all containers
Molecules are represented as undirected graphs. Molecules contain Atom objects and Bond objects.
Atom objects are represented as dictionary with unique number for each atom as key.
Bond objects are stored as sparse matrix with adjacent atoms pair as keys for rows and columns.
Hereafter, atom number is unique integer used to enumerate atoms in molecule. Please, don't confuse it with element number in Periodic Table, hereafter called element number.
Methods for molecule handling and the arguments of MoleculeContainer are described below.
In [ ]:
m1.meta # dictionary for molecule properties storage. For example, DTYPE/DATUM fields of SDF file are read into this dictionary
In [ ]:
m1 # MoleculeContainer supports depiction and graphic representation in Jupyter notebooks.
In [ ]:
m1.depict() # depiction returns SVG image in format string
In [ ]:
with open('molecule.svg', 'w') as f: # saving image to SVG file
f.write(m1.depict())
In [ ]:
m_copy = m1.copy() # isolated copy of molecule
m_copy
In [ ]:
len(m1) # get number of atoms in molecule
# or
m1.atoms_count
In [ ]:
m1.bonds_count # number of bonds
In [ ]:
m1.atoms_numbers # list of atoms numbers
In [ ]:
# this method calculates additional atoms attributes: number of neighbors and hybridization. See below for usage
m1.reset_query_marks() # by default this attributes are None (for speed-up)
m3.reset_query_marks()
The following notations are used for hybridization of atoms. Values are given as numbers below (in parenthesis symbols that are used in SMILES-like signatures are shown):
Neighbors and hybridizations atom attributes are required for substructure operations and structure standardization. See below
In [ ]:
# iterate over atoms using its numbers
list(m1.atoms()) # works the same as dict.items()
In [ ]:
# iterate over bonds using adjacent atoms numbers
list(m1.bonds())
In [ ]:
# access to atom by number
m1.atom(1)
In [ ]:
try:
m1.atom(10) # raise error for absent atom numbers
except KeyError:
print(format_exc())
In [ ]:
# access to bond using adjacent atoms numbers
m1.bond(1, 4)
In [ ]:
try:
m1.bond(1, 3) # raise error for absent bond
except KeyError:
print(format_exc())
In [ ]:
a = m1.atom(1)
# access to information
a.element # element symbol
# or
a['element']
In [ ]:
a.charge # formal charge
# or
a['charge']
In [ ]:
a.multiplicity # atom multiplicity. None if not set
# or
a['multiplicity']
In [ ]:
a.isotope # atom isotope. Default isotope if not set. Default isotopes are the same as used in InChI notation
# or
a['isotope']
In [ ]:
a.x # coordinates
a.y
a.z
# or
a['x']
a['y']
a['z']
In [ ]:
a.neighbors # Number of neighboring atoms is calculated by reset_query_marks method as shown above. It is read-only.
In [ ]:
a.hybridization # Atoms hybridization is calculated by reset_query_marks method as shown above. It is read-only.
In [ ]:
try:
a.hybridization = 2 # Not subsettable. Read-only! Thus error is raised.
except AttributeError:
print(format_exc())
In [ ]:
a.isotope = 16
# or
a['isotope'] = 16
In [ ]:
m1.flush_cache() # due to caching used for speed-up one needs to reset cache of molecule to observe changes in atoms and bonds
m1
In [ ]:
try:
a.isotope = 0 # raise error. Isotope with mass 0 could not exist
except ValueError:
print(format_exc())
In [ ]:
# bond objects also are dictionary-like classes which store information about bond order
b = m1.bond(3, 4)
b.order
#
b['order']
In [ ]:
b.order = 1 # order change also possible
# or
b['order'] = 1
In [ ]:
m1.flush_cache() #after flushing cashe one could see molecule with changes
m1
In [ ]:
try:
b['order'] = 0 # error raises. Bond order 0 is invalid. To delete bond use method described below
except ValueError:
print(format_exc())
One should to use delete_bond
method to break bond
In [ ]:
m1.delete_bond(3, 4)
m1 #Note that cashe flushing is not required here.
Method delete_atom
removes atom from the molecule
In [ ]:
m1.delete_atom(3)
m1
In [ ]:
m_copy # copy unchanged!
Atoms and bonds objects can be converted into integer representation that could be used to classify their types.
Atom type is represented by 21 bit code rounded to 32 bit integer number:
In [ ]:
int(a)
# 131596 == 0001000 000010000 011 00
# 0001000 == 8 Oxygen
# 000010000 == 16 isotope
# 011 == 3 (3 - 3 = 0) uncharged
# 00 == 0 hasn't multiplicity
In [ ]:
int(b) # bonds are encoded by their order
In [ ]:
print(m1.atom_implicit_h(1)) # get number of implicit hydrogens on atom 1
print(m1.atom_explicit_h(1)) # get number of explicit hydrogens on atom 1
print(m1.atom_total_h(1)) # get total number of hydrogens on atom 1
In [ ]:
m1
In [ ]:
m1.check_valence() # return list of numbers of atoms with invalid valences
In [ ]:
m4 # molecule with valence errors
In [ ]:
m4.check_valence()
In [ ]:
m3
In [ ]:
m3.sssr # Method for application of Smallest Set of Smallest Rings algorithm for rings
# identification. Returns list of lists of atoms forming smallest rings
In [ ]:
m2 # it's a salt represented as one graph
In [ ]:
m2.connected_components # list of lists of atoms belonging to graph components
In [ ]:
anion, cation = m2.split() # split molecule to components
In [ ]:
anion # graph of only one salt component
In [ ]:
cation # graph of only one salt component
Sometimes it is more convenient to represent salts as ion pair. Otherwise ambiguity could be introduced, for example in reaction of salt exchange:
Ag+ + NO3- + Na+ + Br- = Ag+ + Br- + Na+ + NO3-. Reactants and products sets are the same.
In this case one can combine anion-cation pair into single graph. It could be convenient way to represent other molecule mixtures.
In [ ]:
salt = anion | cation
# or
anion.union(cation)
salt # this graph has disconnected components, it is considered single compound now
By default, returned substructures are read-only projections of original molecule (except attributes of atoms/bonds).
Changes in original molecule (bond breaking/formation, atom insertion/deletion, atom/bond attributes changes) will be mirrored in projection.
Projections share the same neighbors and hybridization attributes as in initial molecule even if could be wrong for substructure
In [ ]:
proj = m3.substructure([4,5,6,7,8,9]) # projection of substructure
proj
In [ ]:
m3.atom(4).neighbors
In [ ]:
proj.atom(4).neighbors # same as in original molecule and not as it should be in substructure
In [ ]:
from networkx.exception import NetworkXError
try:
proj.reset_query_marks() # change of structure for projections is blocked
except NetworkXError:
print(format_exc())
In [ ]:
benzene = m3.substructure([4,5,6,7,8,9], as_view=False) # Substructure could be extracted as isolated graph (not projection)
benzene
In [ ]:
benzene.atom(4).neighbors is None # empty attribute. Substructure is a new molecule here. We need to call reset_query_marks
In [ ]:
benzene.reset_query_marks()
In [ ]:
benzene.atom(4).neighbors # now number of neighbors for atom 4 is 2. It is not 3 as above where projection was used.
Note:
In [ ]:
proj_copy = proj.copy() # turning projection into molecule using "copy" method
proj_copy
In [ ]:
proj_copy.reset_query_marks() # This not a projection any more but a new molecule
Changes in projection are mirrored. See example:
In [ ]:
m3.delete_bond(4, 5) # we've deleted bond
m3
In [ ]:
proj.flush_cache() # remove cached image. Note that in projection the bond is also deleted.
proj
augmented_substructure
is a substructure consisting from atoms and a given number of shells of neighboring atoms around it.
deep argument is a number of considered shells.
It also returns projection by default.
In [ ]:
aug = m3.augmented_substructure([10], deep=2) # atom 10 is Nitrogen
aug
In [ ]:
aug.atom(10).hybridization #atom has two incident double bond.
This functionality is used for canonic numbering of atoms in molecules. Prime number multiplication based Morgan algorithm is used for atom ranking. Property atoms_order
returns dictionary of atom numbers as keys and their ranks according to canonicalization as values. Equal rank mean that atoms are symmetric (are mapped to each other in automorhisms). In present version, instead of sequential ranks prime numbers are returned.
In [ ]:
m5.atoms_order
remap
method.This method is useful when it is needed to change order of atoms in molecules. First argument to remap
method is dictionary with existing atom numbers as keys and desired atom number as values. It is possible to change atom numbers for only part of atoms. Atom numbers could be non-sequencial but need to be unique.
If argument copy is set True new object will be created, else existing molecule will be changed. Default is False.
In [ ]:
for n, a in m5.atoms():
print(n, a.element)
for n, m, b in m5.bonds():
print(m, n, b.order)
In [ ]:
remapped = m5.remap({4:2}, copy=True)
remapped
In [ ]:
for n, a in remapped.atoms():
print(n, a.element)
for n, m, b in remapped.bonds():
print(m, n, b.order)
In [ ]:
r1 # depiction supported
In [ ]:
r1.meta
In [ ]:
print(r1.reactants, r1.products) # Access to lists of reactant and products. Molecules' signatures are returned by print() method.
reactant1, reactant2, reactant3 = r1.reactants
product = r1.products[0]
Reactions also has standardize
, aromatize
, reset_query_marks
, implicify_hydrogens
and explicify_hydrogens
methods (see part 3). These methods are applied independently to every molecule in reaction.
CGRContainer object is similar to MoleculeConrtainer, except some methods. The following methods are not suppoted for CGRContainer:
CGRContainer also has some methods absent in MoleculeConrtainer:
CGRContainer is undirected graph. Atoms and bonds in CGR has two states: reactant and product.
As mentioned above, atoms in MoleculeContainer have unique numbers. These numbers are used as atom-to-atom mapping in CGRtools upon CGR creation. Thus, atom order for molecules in reaction should correspond to atom-to-atom mapping.
Pair of molecules can be transformed into CGR. Notice that, the same atom numbers in reagents and products imply the same atoms.
Reaction also can be composed into CGR. Atom numbers of molecules in reaction are used as atom-to-atom mapping of reactants to products.
In [ ]:
cgr1 = m7 ^ m8 # CGR from molecules
# or
m7.compose(m8)
print(cgr1) # CGRcontainer object currently doesn't support depiction. CGR signature is printed out.
This is CGR (depiction is made by ChemAxon externally). You can see changed bonds connected to ring.
In [ ]:
r1
In [ ]:
cgr2 = ~r1 # CGR from reactions
# or
r1.compose()
print(cgr2) # signature is printed out.
It is CGR for reaction (depiction is made by ChemAxon externally).
In [ ]:
cgr2.reset_query_marks() # CGRs also has reset_query_marks method
In [ ]:
a = cgr2.atom(2) # atom access is the same as for MoleculeContainer
In [ ]:
a.element # element attribute
# or
a['element']
In [ ]:
a.isotope # isotope attribute
# or
a['isotope']
For CGRContainer attributes charge
, multiplicity
, x
, y
, z
, neighbors
and hybridization
refer to atom state in reactant of reaction; arguments p_charge
, p_multiplicity
, p_x
, p_y
, p_z
, p_neighbors
and p_hybridization
could be used to extract atom state in product part in reaction.
In [ ]:
a.charge # charge of atom in reactant
# or
a['charge']
In [ ]:
a.p_charge # charge of atom in product
#or
a['p_charge']
In [ ]:
a.p_multiplicity # multiplicity of atom in product. It is None and thus not returned
In [ ]:
a.p_x # coordinates of atom in product
a.p_y
a.p_z
In [ ]:
a.neighbors # number of neighbors of atom in reactant
In [ ]:
a.p_neighbors # number of neighbors of atom in product
In [ ]:
a.hybridization # hybridization of atom in reactant. 1 means only single bonds are incident to atom
In [ ]:
a.p_hybridization # hybridization of atom in product. 1 means only single bonds are incident to atom
In [ ]:
b = cgr1.bond(4, 10) # take bond
In [ ]:
b.order # bond order in reactant
In [ ]:
b.p_order is None # bond order in product in None
CGR can be decomposed back to reaction, i.e. reactants and products.
Notice that CGR can lose information in case of unbalanced reactions (where some atoms of reactant does not have counterpart in product, and vice versa). Decomposition of CGRs for unbalanced reactions back to reaction may lead to strange (and erroneous) structures.
In [ ]:
reactant_part, product_part = ~cgr1 # CGR of unbalanced reaction is decomposed back into reaction
# or
cgr1.decompose()
In [ ]:
reactant_part # reactants extracted. One can notice it is initial molecule
In [ ]:
product_part #extracted products. Originally benzene was the product.
For decomposition of CGRContainer back into ReactionContainer CGRpreparer
class can be used. CGRpreparer
is callable object
In [ ]:
from CGRtools import CGRpreparer # import of CGRpreparer
preparer = CGRpreparer() # initialization of CGRpreparer
In [ ]:
decomposed = preparer.decompose(cgr2) # decomposition of CGR2 into reaction
In [ ]:
decomposed # You can see that water absent in products initially was restored.
# This is a side-effect of CGR decomposing that could help with reaction balancing.
# But balancing using CGR decomposition works correctly only if minor part atoms are lost
# but multiplicity and formal charge are saved. In next release electronic state balansing will be added.
In [ ]:
r1 # compare with initial reaction
In [ ]:
from CGRtools.containers import*
In [ ]:
m10.reset_query_marks()
m10 # ether
In [ ]:
carb = m10.substructure([5,7,8, 2]) # extract projection of carboxyl fragment
carb
In [ ]:
q = QueryContainer(carb) # convert fragment into query
print(q) # QueryContainer don't support depiction yet but signatures can be extracted
CGRs also can be transformed into Query.
QueryCGRContainer
is similar to QueryContainer class for CGRs and has the same API.
QueryCGRContainer
take into account state of atoms and bonds in reactant and product, including neighbors and hybridization
In [ ]:
cgr1.reset_query_marks() # cgr1 is CGRContaier its labels could be calculated
cgr_q = QueryCGRContainer(cgr1) # transfrom CGRContainer into QueryCGRContainer
print(cgr_q) # print out signature of query
In [ ]:
from CGRtools.containers import *
In [ ]:
m = MoleculeContainer() # new empty molecule
m.add_atom('C') # add Carbon atom using element symbol
m.add_atom(6) # add Carbon atom using element number. {'element': 6} is not valid, but {'element': 'O'} is also acceptable
m.add_atom({'element': 'O', 'charge': -1}) # add negatively charged Oxygen atom. Similarly other atomic properties can be set
# add_atom has second argument for setting atom number.
# If not set, the next integer after the biggest among already created will be used.
m.add_atom({'element': 'Na', 'charge': 1}, 4)
In [ ]:
m.add_bond(1, 2, 1) # add bond with order = 1 between atoms 1 and 2
m.add_bond(3, 2, {'order': 1}) # the other possibility to set bond order
In [ ]:
m.calculate2d() #experimental function to calculate atom coordinates. Has number of flaws yet
m
Reactions can be constructed from molecules
In [ ]:
r = ReactionContainer() # empty reaction
r.reactants.append(m1) # add reactant
r.products.append(m11) # add product
# or
r = ReactionContainer(reactants=[m1], products=[m11]) # one-step way to construct reaction
# or
r = ReactionContainer([m1], [m11]) # first list of MoleculeContainers is interpreted as reactants, second one - as products
In [ ]:
r
In [ ]:
# reactants, products, reagents attributes are list-like.
r.products.append(m.copy()) # One can add (or remove) molecules directly to this list of products returned by r.products
r.flush_cache()
In [ ]:
r # coordinates will left unchanged. Thus depiction could look wrong.
In [ ]:
r.fix_positions() # this method fixes coordinates of molecules in reaction
r
QueryContainers can be constructed in the same way as MoleculeContainers.
Unlike other containers QueryContainers additionally support atoms, neighbors and hybridization lists.
In [ ]:
q = QueryContainer() # creation of empty container
q.add_atom({'element': ['N', 'O']}) # add atom list: N or O atom, any isotope and multiplicity, neutral,
# number of neighbors and hybridization are irrelevant
q.add_atom({'element': 'C', 'neighbors': [2, 3]}) # add carbon atom, any isotope and multiplicity, neutral,
# has 2 or 3 non-hydrogen heighbors
q.add_atom({'element': 'O'}) # add oxygen atom, any isotope and multiplicity, neutral,
# number of neighbors and hybridization are irrelevant
q.add_bond(1, 2, 1) # add single bond between atom 1 and 2
q.add_bond(2, 3, 2) # add double bond between atom 1 and 2
# any amide or carboxilic group will fit this query
In [ ]:
print(q) # print out signature (SMILES-like)
In [ ]:
from CGRtools.containers import MoleculeContainer
from CGRtools.attributes import Atom
In [ ]:
class MarkedAtom(Atom): # this class will inherite Atom class
__slots__ = '__mark' # all new attributes should be slotted!
def __init__(self, **kwargs):
super().__init__(**kwargs)
self.__mark = None # set default value for added attribute
@property
def mark(self): # created new property
return self.__mark
@mark.setter
def mark(self, mark):
# do some checks and calculations
self.__mark = mark
In [ ]:
class MarkedMoleculeContainer(MoleculeContainer): # MarkedAtomContainer inherits MoleculeContainer
node_attr_dict_factory = MarkedAtom # override atom container
In [ ]:
m = MarkedMoleculeContainer() # create newly developed container MarkedMoleculeContainer
m.add_atom('C') # add atom C using methods inherited from MoleculeContainer
m.add_atom('N') # add atom N
m.add_bond(1, 2, 1) # add bond
m.atom(2)['mark'] = 1 # set mark on atom. In this example dictinary setitem supported but update not.
In [ ]:
m.atom(2).mark # one can return mark